Dataset: Human Phenotype Ontology Gold Standard (HPO-GS)
Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports
NLP Tasks: Text Classification, Information Extraction, Question Answering
Method: Comparison of the diagnostic performance of three large language models (LLMs): GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO])
Metrics:
- Diagnostic accuracy (GPT-4: 13.1%, GPT-4 HPO: 8.2%, Gemini Pro: 8.2%)
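Diagnostic accuracy here is the share of case reports for which a model's diagnosis matches the reference diagnosis. The sketch below assumes a top-1, exact string match after trimming and case-folding; the study may use a different matching rule (for example, crediting a correct diagnosis anywhere in a differential list). The function name and the toy cases are hypothetical and are not taken from the study.

```python
def diagnostic_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Return the fraction of cases whose predicted diagnosis matches the reference.

    Assumption: top-1 exact string match after trimming and case-folding.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("no cases to score")
    # Each True counts as 1 when summed, so this is the number of correct cases.
    correct = sum(
        pred.strip().casefold() == ref.strip().casefold()
        for pred, ref in zip(predictions, gold)
    )
    return correct / len(gold)


# Toy data (hypothetical, not from the study).
gold = ["Rett syndrome", "Angelman syndrome", "Dravet syndrome", "Noonan syndrome"]
model_a = ["rett syndrome", "Prader-Willi syndrome", "Dravet syndrome", "Turner syndrome"]
model_b = ["Rett syndrome", "Williams syndrome", "Fragile X syndrome", "Turner syndrome"]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    # Formats the fraction as a percentage, e.g. 0.5 -> "50.0%".
    print(f"{name}: {diagnostic_accuracy(preds, gold):.1%}")
```

With these toy inputs, model_a is correct on 2 of 4 cases (50.0%) and model_b on 1 of 4 (25.0%).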
Fine-tuning large language models for rare disease concept normalization
NLP Tasks: Named Entity Recognition, Information Extraction
Method: Fine-tuning Llama 2, an open-source large language model (LLM)
Metrics:
- Accuracy (over 99%)
- Accuracy on terms containing typos (NAME: 10.2%, NAME+SYN: 36.1%; NAME+SYN with typo-specific fine-tuning: 61.8%)
- Accuracy on unseen synonyms (NAME: 11.2%, NAME+SYN: 92.7%)
Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT
NLP Tasks: Information Extraction, Text Classification, Text Generation
Method: PhenoBCBERT and PhenoGPT models
Metrics:
- Accuracy